A system for extracting information
from a natural-language text

The present invention relates to a system for extracting
information from a natural-language text, with a view to
selecting those words or those groups of words in the text 
which best describe the subjects addressed in the text. Such 
words or groups of words can be referred to as "keywords" and 
they are, in particular, usable for the purposes of indexing 
the text in a documentary database, in particular for 
automatically summarizing the text, for categorizing it, or 
for any other attempt at knowledge representation. 

Known information extraction systems that attempt to
achieve these objectives use analysis methods of the following
three types:

- statistical analysis methods that attempt to select the
most representative words of the text by counting their
frequencies of occurrence, and by keeping only those whose
frequency is neither too low nor too high;

- thesaurus analysis methods which operate using a 
predefined representation of knowledge and which are based on 
prior definition of a structured reference lexicon or 
"thesaurus". Such a thesaurus is defined entirely manually and 
must be defined in each specialty field; 

- pattern-recognition analysis methods that operate using 
statistical identification of patterns. 

Comparative operation of those three types of analysis methods
is illustrated below by analyzing the following text:

«Cats», l'une des comédies musicales les plus longtemps à l'affiche, va
tirer sa révérence après vingt et une années sur la scène londonienne. La
dernière représentation de cette œuvre d'Andrew Lloyd Webber aura lieu le
11 mai, jour de son 21e anniversaire, après quelque 9 000 représentations.
L'annonce a été faite trois jours après la dernière représentation de
«Starlight Express», la seconde comédie musicale la plus longtemps à
l'affiche à Londres, après dix-huit années sur les planches.

La fin de «Cats» est un coup dur supplémentaire pour le quartier de Covent
Garden, où sont regroupés la plupart des théâtres londoniens, et qui a
souffert d'une forte baisse de fréquentation en 2001. Depuis 1981, année
de son lancement, la comédie musicale a, depuis, été interprétée devant
plus de 50 millions de spectateurs en 11 langues et dans 26 pays.

(source: Reuter) 

The following is a translation into English of the above
text:

"Cats", one of the musical comedies that have been on the bills for the
longest, is bowing out after twenty-one years on the London stage. The
last performance of this work composed by Andrew Lloyd Webber will be
given on May 11, its 21st anniversary, after some 9000 performances. The
announcement was made three days after the last performance of "Starlight
Express", the musical comedy that has been on the London bills for the
second longest time, after eighteen years on the boards.

The closure of "Cats" is a further blow for the Covent Garden district, in
which most of London's theaters are located, and which suffered a sharp
downturn in ticket sales in 2001. Since 1981, the year it opened, this
musical comedy has been performed for over 50 million spectators, in 11
languages and in 26 countries.

(source: Reuters (translated into English from the French))

Operation of statistical analysis methods: 

If we consider their approaches in caricatured manner,
statistical analysis methods count the words in the text and
keep only those whose frequency is neither too low nor too
high, while sometimes removing tool words (articles,
prepositions, conjunctions, and verbal auxiliaries) in order
to hone down the results. As regards the text proposed above,
the words of "medium" frequency (without taking the tool words
into account) are then as follows:

bills, years, Cats, comedy, last, longest, musical and
performance.

Although the main advantage of statistical analysis 
methods lies in their great algorithmic simplicity, their main 
disadvantage lies in the low degree of pertinence of their 
results. The words of "medium" frequency in a text are rarely 
the most representative. However, those methods can give 
better results on texts that are longer than the text given by 
way of example above. 
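
By way of illustration only, the frequency-based selection described above can be sketched in Python as follows. The stop list `TOOL_WORDS`, the thresholds `low` and `high`, and the sample sentence are assumptions made for this sketch; they are not taken from any particular known system.

```python
from collections import Counter
import re

# Illustrative stop list (an assumption); a real system would use a fuller one.
TOOL_WORDS = {"the", "a", "of", "on", "in", "and", "is", "has", "been", "that", "for"}

def medium_frequency_words(text, low=2, high=3):
    """Return the words whose count lies in [low, high], ignoring tool words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in TOOL_WORDS)
    return sorted(w for w, n in counts.items() if low <= n <= high)

sample = ("The musical comedy is on the bills. The musical comedy has been "
          "on the London bills. The last performance of the comedy.")
print(medium_frequency_words(sample))
```

Words occurring only once, such as "London" in the sample, are discarded regardless of how well they describe the subject, which is the weakness discussed above.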

In addition, because the text is subdivided into words, 
i.e. into strings of characters whose delimiters are spaces, 
the semantic links that can link up words, e.g. the words 
"musical" and "comedy" are lost. 

Operation of thesaurus analysis methods: 

Those methods are based on prior definition of a 
structured reference lexicon or "thesaurus". As mentioned 
above, such a thesaurus is defined entirely manually and 
should be defined in each specialty field. 

For example, let us imagine the following thesaurus:

show -> comedy (comedies), drama
     -> musical -> Cats
                -> Jesus Christ Superstar
erudite 

With that type of method, it is always possible to
identify those words of the source text which are found in
exactly the same form in the thesaurus. The advantage of that
type of method is that it is possible to be sure that the
identified words correspond to an established and listed
cultural or scientific reality. In addition, it is possible to
deduce a unifying term such as "show" which is not part of
the initial text, but which characterizes it correctly.
Unfortunately, the major drawback with such a method is that 
the thesaurus must be continuously updated so that it remains 
pertinent, which gives rise to high maintenance costs. Another 
major drawback with such a method lies in the fact that a 
thesaurus compiled for analyzing texts in the field of 
chemistry cannot be used for texts in the field of 
electronics, for example. In addition, when the thesaurus is 
not exhaustive, certain expressions that can be very pertinent 
are not recognized as being so. 
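
By way of illustration only, a thesaurus lookup of this kind can be sketched in Python as follows. The `PARENT` table is a hypothetical encoding of the small thesaurus imagined above, and the function name is an assumption of this sketch.

```python
# Hypothetical thesaurus: each term maps to its broader (parent) term.
PARENT = {
    "comedy": "show",
    "drama": "show",
    "musical": "show",
    "Cats": "musical",
    "Jesus Christ Superstar": "musical",
}

def match_thesaurus(text):
    """Return the thesaurus terms found verbatim in the text, and their broader terms."""
    found = [term for term in PARENT if term in text]
    broader = set()
    for term in found:
        while term in PARENT:      # climb towards the root of the thesaurus
            term = PARENT[term]
            broader.add(term)
    return found, sorted(broader)

print(match_thesaurus("Cats, this musical comedy, is bowing out"))
```

Because matching is on the exact form, the plural "comedies" is not recognized unless it is itself listed, which illustrates why a non-exhaustive thesaurus misses pertinent expressions.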

Operation of pattern-recognition analysis methods:

Known pattern-recognition analysis methods are methods of
statistically identifying patterns. Although they considerably
improve the above-mentioned statistical analysis methods, by
keeping track of word pairings, such as, for example, the terms
"musical" and "comedy" of the above example, they do not make
it possible to analyze short texts correctly. Statistical
methods need quantity in order to operate properly.

For example, the keywords of the text given by way of
example are obtained by rough comparison of sequences of
various lengths. The tool words (the, etc.) do not count, and
the sequences are formed on the basis of one word, plus three
or fewer words:

Cats 

Cats musical 

Cats musical comedies 

Cats musical comedies bills 

musical 

musical comedies 
musical comedies bills 
musical comedies bills longest 
comedies 



BSBS1W0CB- 16/01/05 



18/01 05 15:55 FAX 441 21 343 40 50 ABREMA 



+ WOODARD 




comedies bills 
comedies bills longest 
comedies bills longest bowing 
etc.

It then suffices to group together the various sequences
obtained, by approximation on the form (e.g. "comedies" and
"comedy"), and to count the most frequent combined expressions
such as "musical comedies".

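
By way of illustration only, the formation of sequences of one word plus three or fewer following words can be sketched in Python as follows. The function name and the parameter `max_extra` are assumptions of this sketch; the input list is the start of the example text with the tool words already removed.

```python
def word_sequences(content_words, max_extra=3):
    """Yield each word alone, then extended by up to `max_extra` following words."""
    for start in range(len(content_words)):
        for extra in range(max_extra + 1):
            end = start + extra + 1
            if end > len(content_words):
                break                      # not enough words left to extend further
            yield " ".join(content_words[start:end])

words = ["Cats", "musical", "comedies", "bills", "longest", "bowing"]
for sequence in word_sequences(words):
    print(sequence)
```

The sequences produced for the first words match the list given above; grouping similar forms and counting the most frequent combinations would then be a separate step.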

An object of the present invention is to propose a system
for extracting information from a natural-language text, which
system makes it possible to remedy the drawbacks of known
analysis methods, by making it possible, in particular, for
both short and long texts to be analyzed with good quality.

This system uses a pattern-identification analysis method
that relies not only on statistical analysis, but also on
syntactical analysis.

To summarize, the proposed system converts the words of 
the text into a succession of syntactical categories, and then 
compares subsets of the text with predefined syntactical 
patterns, so as to identify nominal groups without prejudging 
the importance of the words that make up said groups. 

Thus, the words "potatoes" or "power electronics" are not 
important in themselves, but rather they are important 
relative to the text in which they occur. In a general text, 
"power electronics" can be merely an example, and not a 
keyword of the text, whereas it is probably a keyword in a 
text dealing with transistors. It is the context that makes 
the keyword, and the system of the present invention includes 
a sort of syntactical context analyzer. Similarly, the word 
"bearing" may be recognized as being nominal in certain 
contexts because of its position relative to the other words 
in the text, or merely as a structural word in other texts.

A pattern-identification analysis method is proposed in
Patent Document US 4,864,501. The method described in that
prior document uses a dictionary containing base forms for
encoding the words of the text with a view to identifying
patterns. In addition to the fact that that dictionary is very
voluminous because it contains several tens of thousands of
entries, that method requires complex algorithms for
retrieving the base forms of the words, which algorithms are
specific to each language, in order to find words in the
dictionary, and that method can require specific prefix/suffix
tables for coping with spelling mistakes, etc. Therefore, that
method is very complex to implement and to use.

The extraction system of the present invention makes it 
possible to remedy those drawbacks. 

To this end, the invention provides a method of extracting
information from a natural-language text, by identifying
patterns, in which method the words of the text are encoded by
comparing them with the contents of a predefined lexicon
containing a few tens of tool words, and nominal groups are
then identified by searching subsets of the resulting
succession of encoded words to look for groups of encoded
words that comply with predefined syntactical rules.

The invention also provides a system for extracting
information from a natural-language text, said system
comprising:

- an input unit for receiving said natural-language
text;

- a lexicon file in which tool words are recorded;

- an analysis processor connected to said input unit,
and to the lexicon file, and organized to act in a
first stage to encode the words of the natural-language
text by evaluating the grammatical function
of each word by comparing it with the contents of said
lexicon file of tool words, so as to identify the tool
words in the text and so as to evaluate the functions
of the usage words which are not recognized as being
tool words, by comparing their locations relative to
the locations of the words recognized as being tool
words, and, in a second stage, to search subsets of
the resulting succession of encoded words to look for
groups of encoded words that comply with predefined
syntactical rules, so as to identify nominal groups;
and

- an output unit connected to said analysis processor
for receiving the groups of encoded words recognized
as being syntactical patterns.

The extraction system of the invention evaluates the 
grammatical functions of the words of the text to be analyzed 
by means of a predetermined lexicon containing the few tens of 
tool words that are specific to each language and that are 
essentially constituted by articles, prepositions, 
conjunctions, and verbal auxiliaries. The functions of the 
other words are then deduced merely by means of the locations 
of the tool words. Since the tool words of a text commonly 
represent 40 % to 50 % of the words of the text, the number of 
said tool words is thus always high enough to enable the other 
words to be evaluated. Then, only those portions of the text
whose grammar is identified as being possible keywords are
kept.

The extraction system of the invention offers numerous 
advantages. In particular, the lexicon of tool words that is 
used by the system is incomparably smaller than the 
dictionaries containing thousands of words that are used by 
known systems. It should also be noted that no human 
intervention is necessary in order to determine the keywords, 
that the system can operate for texts in various languages, 
and that, apart from the lexicon of tool words, it does not 
require any other lexicon. In addition, since the semantic and 
grammatical values of the tool words are set and hardly ever 
change over several decades, maintenance of the lexicon is 
very limited. Conversely, the values of the other words that
might be termed "usage words" (verbs, nouns, adjectives),
change continuously over time, as a function of usage, as a
function of changes in trades or in sciences, or merely as a
function of current affairs. Since the system of the present
invention makes no presupposition about the values of usage
words, it operates identically in all fields, be they
literary, technical, or scientific, whereas systems using
known methods must always be enriched with specialist lexicons
that are often custom-made. Finally, this system of extraction
makes it possible to address new languages incomparably more
rapidly than any other prior system does.

In addition, unlike systems using statistical analysis
methods in which the frequency of occurrence of the words is a
selection criterion, which assumes that the text is long
enough, the system of the invention considers frequency of
occurrence of words to be of only minor importance, and it
operates both for long texts that are several tens of pages
long, and for short texts that are a few lines long.

By way of example, a description follows of a system of
the invention for extracting information from a natural-language
text, the description being given with reference to
the drawing, in which:

- Figure 1 is a block diagram of the extraction system of
the invention; and

- Figure 2 is a block diagram of the steps of an
implementation of the method of the invention.

Using a syntactical model requires the language of the
text under analysis to be recognized. This is thus naturally
the first operation performed by the extraction system of the
invention. Said language recognition can be based merely on
statistical criteria of co-occurrence of letters. Recognizing
languages, e.g. English, Spanish, French, Portuguese, German,
or Italian, makes it possible to guide the analyses that are
to be performed downstream.
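
By way of illustration only, language recognition based on co-occurrence of letters can be sketched in Python as follows. The use of letter-pair counts compared by cosine similarity is one possible choice, assumed here for the sketch; the text does not specify the statistical criterion. The three short samples are hypothetical and far too small for practical use.

```python
from collections import Counter
from math import sqrt

# Tiny illustrative samples (assumptions); real profiles need much larger corpora.
SAMPLES = {
    "en": "the last performance of this work will be given after eighteen years on the boards",
    "fr": "la dernière représentation de cette œuvre aura lieu après dix-huit années sur les planches",
    "es": "la última representación de esta obra tendrá lugar después de dieciocho años en escena",
}

def bigram_profile(text):
    """Count adjacent character pairs, i.e. co-occurrences of letters."""
    text = text.lower()
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def cosine(p, q):
    """Cosine similarity between two count profiles."""
    dot = sum(p[k] * q[k] for k in p if k in q)
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0

PROFILES = {lang: bigram_profile(sample) for lang, sample in SAMPLES.items()}

def guess_language(text):
    """Return the language whose profile is closest to that of the text."""
    profile = bigram_profile(text)
    return max(PROFILES, key=lambda lang: cosine(profile, PROFILES[lang]))
```

The language returned would then select the lexicon of tool words and the grammar used in the later steps.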

The next step is a text-profiling step that makes it
possible to identify the lines of text (paragraphs) that
include linguistic information, and to group paragraphs
together. This operation is particularly useful for texts that
are structured (with titles, subtitles, etc.), because it
makes it possible to group paragraphs together consistently.
Said operation is unnecessary for short texts.

The next step consists of a regularization operation for
regularizing the text, during which amalgams of signs are
removed, e.g. the typographic characters are separated from
the alphabetic characters. For example, it is useful to
recognize the chain "word," as being the term "word" followed
by ",", while the chain "1.5" should be recognized as being a
numeral.
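
By way of illustration only, such a regularization can be sketched in Python with a single regular expression. The pattern `TOKEN` and the function name are assumptions of this sketch; in particular, treating digits joined by "." or "," as one numeral is one possible reading of the "1.5" example.

```python
import re

# A numeral (digits, optionally joined by "." or ","), or a word, or one sign.
TOKEN = re.compile(r"\d+(?:[.,]\d+)*|\w+|[^\w\s]")

def regularize(text):
    """Separate typographic signs from words with blank spaces."""
    return " ".join(TOKEN.findall(text))

print(regularize("word, 1.5"))
print(regularize("l'affiche."))
```

Listing the numeral alternative first is what keeps "1.5" together while the full stop after an ordinary word is split off.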

In the text given by way of example, this step involves
separating the typographical characters (e.g. "," and ".")
from the other words by blank spaces. The text given by
way of example then becomes:



in French:

« Cats » , l ' une des comédies musicales les plus longtemps à l ' affiche , va tirer
sa révérence après vingt et une années sur la scène londonienne . La dernière
représentation de cette œuvre d ' Andrew Lloyd Webber aura lieu le 11 mai , jour
de son 21 e anniversaire , après quelque 9 000 représentations . L ' annonce a été
faite trois jours après la dernière représentation de « Starlight Express » , la
seconde comédie musicale la plus longtemps à l ' affiche à Londres , après dix -
huit années sur les planches .

La fin de « Cats » est un coup dur supplémentaire pour le quartier de Covent
Garden , où sont regroupés la plupart des théâtres londoniens , et qui a souffert d '
une forte baisse de fréquentation en 2001 . Depuis 1981 , année de son lancement
, la comédie musicale a , depuis , été interprétée devant plus de 50 millions de
spectateurs en 11 langues et dans 26 pays .

in English:

" Cats " , one of the musical comedies that have been on the bills for the longest ,
is bowing out after twenty-one years on the London stage . The last performance of
this work composed by Andrew Lloyd Webber will be given on May 11 , its 21 st
anniversary , after some 9000 performances . The announcement was made three
days after the last performance of " Starlight Express " , the musical comedy that
has been on the London bills for the second longest time , after eighteen years on
the boards .

The closure of " Cats " is a further blow for the Covent Garden district , in which
most of London ' s theaters are located , and which suffered a sharp downturn in
ticket sales in 2001 . Since 1981 , the year it opened , this musical comedy has
been performed for over 50 million spectators , in 11 languages and in 26 countries
.


The next step, which constitutes a key step for the
system, consists in determining the category of each word. By
means of the limited lexicon of tool words, the words of the
text are encoded on the basis of the grammatical categories
attributed as a function of the syntactical values of the
words. Firstly, the tool words of the lexicon are recognized
in the text, and then the functions of the other words in the
text are deduced from their locations relative to the already-recognized
tool words.

Thus, if we adopt, for example, the following categories:

s: structure word (tool word not useful for the remainder of
the analysis)

d: determiner (the, a, any, etc. [in French "le, la, les",
etc.])

p: preposition (of, in, by, etc. [in French "de, en, par",
etc.])




4: opening or closing sign

1 or 2: punctuation

3: apostrophe

N: numeral

W: proper noun

w: common noun

c: amalgam or contracted form (no occurrence in the English
text [in French "du, des, au, aux", etc.])

a: back-reference or anaphor (this, its, etc. [in French "ce,
cet, ces", etc.])

*: code attributed if none of the above categories are
recognized

The text in French that is given by way of example above 
thus becomes: 

4W242d3dcw3w4dw1 w2pd3w$2$w2aw4w2w1 pdw2pdw2w41 dw3w5paw2p3WWW 
w2w1 dNwl 2w1 pa*w52w2dNNw51 d3w3sw2w2dw1 w2dw3w5p4WW42dw3w3w4dw1 
w2pd3w3pW2w2d0dw2pdw21 dwl p4W4sdw1 w1 w5 p d w2 p W W 2 s 5 w3 d w2 c w2 w3 2 p& 
sw2p3dw2w2pw4pN1 W N 2 w2 p a w32 d w3 w4s 2 w2 2 w2 w4 w2 w1 p N w2 p w3 p N w2 p p N w1 1 
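
By way of illustration only, the encoding step can be sketched in Python for English as follows. The mini-lexicon `LEXICON`, the sign table `SIGNS` (including the choice of 1 for a full stop and 2 for other punctuation), and the positional rules used for words outside the lexicon are all assumptions of this sketch. It emits one code per token and does not attempt to reproduce the exact coded string shown above.

```python
# Hypothetical mini-lexicon of English tool words mapped to category codes.
LEXICON = {
    "the": "d", "a": "d", "any": "d", "some": "d",
    "of": "p", "in": "p", "by": "p", "on": "p", "for": "p", "after": "p",
    "this": "a", "its": "a",
    "is": "s", "has": "s", "been": "s", "that": "s", "will": "s", "be": "s",
}
# Assumed split of punctuation codes: 1 = full stop, 2 = other punctuation.
SIGNS = {'"': "4", "(": "4", ")": "4", ".": "1", ",": "2", ";": "2", "'": "3"}

def encode(tokens):
    """Return one category code per token, as a string."""
    codes = []
    for tok in tokens:
        prev = codes[-1] if codes else None
        if tok in SIGNS:
            code = SIGNS[tok]
        elif tok.lower() in LEXICON:
            code = LEXICON[tok.lower()]        # recognized tool word
        elif tok[0].isdigit():
            code = "N"
        elif tok[0].isupper() and prev not in (None, "1"):
            code = "W"                         # capitalized away from a sentence start
        elif prev in ("d", "p", "a", "c", "w", "W", "N"):
            code = "w"                         # position after a tool word suggests a noun
        else:
            code = "*"
        codes.append(code)
    return "".join(codes)

print(encode("the musical comedy of Andrew Lloyd Webber .".split()))
```

Only the tool words are looked up; every other code is inferred from the neighboring codes, which is the principle described above.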

The next step consists in identifying the linguistic
structures known as "nominal syntagms" in linguistics
terminology or, more simply, as "nominal groups".

The entire set of syntactical patterns that are worth
identifying constitutes the analysis grammar. Since the
grammar for the French example is common to all of the Romance 
languages, it is possible to analyze a large number of 
languages by using the same extraction system of the invention 
without requiring any complex adaptation. 

By way of example, a (simplified) grammar can take the
following form:

(1) nominal syntagm -> determiner , nominal group ; w .

(2) determiner -> d ; d , 3 ; numeral ; c ; a .

(3) d -> 'the' ; 'some' ; 'any' ; 'a' ; etc.
[in French:
(3) d -> 'le' ; 'la' ; 'les' ; 'des' ; etc.]

(3bis) not suitable in English
[in French:
(3bis) c -> 'du' ; 'au' ; 'aux' ; etc.]

(3ter) a -> 'this' ; 'these' ; 'that' ; 'those' ; 'its' ; etc.
[in French:
(3ter) a -> 'ce' ; 'cette' ; 'ces' ; 'son' ; etc.]

(4) nominal group -> expression , nominal group .

(5) expression -> w , p , w ; w .

(6) p -> 'from' ; 'to' ; 'for' ; 'without' ; etc.
[in French:
(6) p -> 'de' ; 'à' ; 'pour' ; 'sans' ; etc.]

The arrow reads "is rewritten", the comma reads "followed
by", the semi-colon means "or", and the period marks the end
of the rule. Rule (1) reads, "nominal syntagm is rewritten
determiner followed by nominal group".



Rules (3) and (6) are said to be "terminal rules" because
they make use of the lexical forms of the lexicon of tool
words.



Rule (4) is a recursive rule. A nominal group can thus
contain an infinite number of expressions, which, according to
Rule (5), are either of the w p w type or of the w type.

The following successions of grammatical categories are
thus recognized as being nominal groups:



d w 

d w p w 

d w w 

d w w p w 

d 3 w w 
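
By way of illustration only, the simplified grammar can be sketched in Python as a regular expression over the string of category codes. This sketch assumes that a nominal group consists of one or more expressions (that is, that the recursion of Rule (4) ends on a single expression), writes the numeral as its code N, and leaves out the bare "w" alternative of Rule (1); the names used are hypothetical.

```python
import re

DETERMINER = r"(?:d3?|N|c|a)"   # Rule (2): d ; d , 3 ; numeral ; c ; a
EXPRESSION = r"(?:wpw|w)"       # Rule (5): w , p , w ; w
NOMINAL = re.compile(DETERMINER + EXPRESSION + "+")

def is_nominal_group(codes):
    """True if the whole code string is a determiner followed by expressions."""
    return NOMINAL.fullmatch(codes) is not None

def find_nominal_groups(codes):
    """Return (start, end) positions of the nominal groups found in a coded text."""
    return [(m.start(), m.end()) for m in NOMINAL.finditer(codes)]

for pattern in ["dw", "dwpw", "dww", "dwwpw", "d3ww"]:
    print(pattern, is_nominal_group(pattern))
print(find_nominal_groups("dwwpw1pdw"))
```

Under these assumptions, the five successions listed above are all accepted, while a succession such as "pw", which lacks a determiner, is rejected.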






In the text given by way of example, the nominal groups
identified by means of this grammar are underlined:

French version:

«Cats», l'une des comédies musicales les plus longtemps à l'affiche, va
tirer sa révérence après vingt et une années sur la scène londonienne. La
dernière représentation de cette œuvre d'Andrew Lloyd Webber aura lieu le
11 mai, jour de son 21e anniversaire, après quelque 9 000
représentations. L'annonce a été faite trois jours après la dernière
représentation de «Starlight Express», la seconde comédie musicale la plus
longtemps à l'affiche à Londres, après dix-huit années sur les planches.

La fin de «Cats» est un coup dur supplémentaire pour le quartier de Covent
Garden, où sont regroupés la plupart des théâtres londoniens, et qui a
souffert d'une forte baisse de fréquentation en 2001. Depuis 1981, année
de son lancement, la comédie musicale a, depuis, été interprétée devant
plus de 50 millions de spectateurs en 11 langues et dans 26 pays.

(source: Reuter)



English version:

"Cats", one of the musical comedies that have been on the bills for the
longest, is bowing out after twenty-one years on the London stage. The
last performance of this work composed by Andrew Lloyd Webber will be
given on May 11, its 21st anniversary, after some 9000 performances. The
announcement was made three days after the last performance of "Starlight
Express", the musical comedy that has been on the London bills for the
second longest time, after eighteen years on the boards.

The closure of "Cats" is a further blow for the Covent Garden district, in
which most of London's theaters are located, and which suffered a sharp
downturn in ticket sales in 2001. Since 1981, the year it opened, this
musical comedy has been performed for over 50 million spectators, in 11
languages and in 26 countries.

(source: Reuters (translated into English from the French))



Since the nominal groups represent approximately 50 % of
the text, it is necessary to keep only those for which the
probability of them being genuine keywords of the text is the
highest.



A next step can consist in filtering the nominal groups.

Not all of the nominal groups have the same referential
capacity. Some are more important than others. In order to
determine which of them are the most important, the system of
the invention evaluates the importance of each nominal group
as a function of two criteria, one of which is statistical,
and the other is syntactical.

The statistical criterion:

The most frequently occurring words of the nominal groups
are classified in decreasing order of frequency (while taking
account of approximations such as 'comedy' = 'comedies'),
i.e., in the text given by way of example:

comedy       3
musical      3
bills        2
Cats         2
last         2
performance  2

Only those words whose occurrence exceeds 1 are kept in
the list. The words removed thus have zero value. The value of
each nominal group (initially set at 0) is increased, for each
word that it contains, by the number of occurrences of that
word minus 1. The values of the nominal groups thus become:

musical comedy  (3 - 1) + (3 - 1) = 4
bills           (2 - 1) = 1
London bills    (2 - 1) = 1
Cats            (2 - 1) = 1
etc.
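
By way of illustration only, the statistical criterion can be sketched in Python as follows. The table `FREQ` reproduces the word frequencies listed above; the function name is an assumption of this sketch, and words absent from the table are given a frequency of 1 so that they contribute 0.

```python
# Word frequencies within the nominal groups, as listed in the example.
FREQ = {"comedy": 3, "musical": 3, "bills": 2, "Cats": 2, "last": 2, "performance": 2}

def statistical_value(group):
    """Sum (occurrences - 1) over the words of a nominal group."""
    return sum(FREQ.get(word, 1) - 1 for word in group.split())

for group in ["musical comedy", "bills", "London bills", "Cats"]:
    print(group, statistical_value(group))
```

"London" occurs too rarely to be kept in the list, so "London bills" receives only the point contributed by "bills".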



The syntactical criterion:

When a nominal group is or contains a proper noun, it
takes an additional point of value. Otherwise
it takes 0.

musical comedy  4 + 0 = 4
bills           1 + 0 = 1
London bills    1 + 1 = 2
Cats            1 + 1 = 2
etc.
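
By way of illustration only, the syntactical criterion and the resulting ranking can be sketched in Python as follows. The table `STAT` holds the statistical values of the example, and `PROPER_NOUNS` stands in for the words encoded W in the earlier step; both names are assumptions of this sketch.

```python
# Statistical values of the nominal groups in the example.
STAT = {"musical comedy": 4, "bills": 1, "London bills": 1, "Cats": 1}
# Stand-in for the words that the encoding step marked as proper nouns (W).
PROPER_NOUNS = {"London", "Cats"}

def total_value(group):
    """Statistical value plus one point if the group contains a proper noun."""
    bonus = 1 if any(word in PROPER_NOUNS for word in group.split()) else 0
    return STAT[group] + bonus

ranked = sorted(STAT, key=total_value, reverse=True)
for group in ranked:
    print(group, total_value(group))
```

The ranked list then serves as the basis for separating the most important nominal groups from the secondary ones.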

With this evaluation, it is easy to classify the nominal 
groups. In the text given by way of example, the nominal 
groups perceived to be the most important are double- 
underlined, the groups deemed to be of secondary importance 
are single-underlined, while the others are simply eliminated. 


French version:

«Cats», l'une des comédies musicales les plus longtemps à l'affiche, va
tirer sa révérence après vingt et une années sur la scène londonienne. La
dernière représentation de cette œuvre d'Andrew Lloyd Webber aura lieu le
11 mai, jour de son 21e anniversaire, après quelque 9 000 représentations.
L'annonce a été faite trois jours après la dernière représentation de
«Starlight Express», la seconde comédie musicale la plus longtemps à
l'affiche à Londres, après dix-huit années sur les planches.

La fin de «Cats» est un coup dur supplémentaire pour le quartier de Covent
Garden, où sont regroupés la plupart des théâtres londoniens, et qui a
souffert d'une forte baisse de fréquentation en 2001. Depuis 1981, année
de son lancement, la comédie musicale a, depuis, été interprétée devant
plus de 50 millions de spectateurs en 11 langues et dans 26 pays.

(source: Reuter)


English version:

"Cats", one of the musical comedies that have been on the bills for the
longest, is bowing out after twenty-one years on the London stage. The
last performance of this work composed by Andrew Lloyd Webber will be
given on May 11, its 21st anniversary, after some 9000 performances. The
announcement was made three days after the last performance of "Starlight
Express", the musical comedy that has been on the London bills for the
second longest time, after eighteen years on the boards.

The closure of "Cats" is a further blow for the Covent Garden district, in
which most of London's theaters are located, and which suffered a sharp
downturn in ticket sales in 2001. Since 1981, the year it opened, this
musical comedy has been performed for over 50 million spectators, in 11
languages and in 26 countries.

(source: Reuters (translated into English from the French))